-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add XXXX humanization #88
base: master
Are you sure you want to change the base?
Conversation
(in InternationalizedHumanizer class) |
|
||
// *** if the expression $date->getDay() === null && $date->getMonth() === null | ||
// is superfluous as it seems the function can be restored to its | ||
// original declaration ( int $year, UnspecifiedDigit $unspecifiedDigit ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll, find out by removing the code. We have the tests, and you can add more tests if there are cases you are not sure about.
// is superfluous as it seems the function can be restored to its | ||
// original declaration ( int $year, UnspecifiedDigit $unspecifiedDigit ) | ||
|
||
if ( $unspecifiedDigit->getYear() === 4 && $date->getDay() === null && $date->getMonth() === null ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this 4 about?
If all 3 of these conditions are needed, then extracting into isUnspecifiedYear(): bool
would be nice. Remember: one level of abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think that -5XXXX is possible, which might break your === 4 check.
@@ -33,6 +33,10 @@ public function humanizationProvider(): Generator { | |||
|
|||
yield 'Month only' => [ 'XXXX-12-XX', 'December' ]; | |||
yield 'Day only' => [ 'XXXX-XX-12', '12th' ]; | |||
|
|||
// https://github.com/ProfessionalWiki/EDTF/issues/80 | |||
yield 'Some year' => [ 'XXXX', 'some year' ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add
yield 'Some year' => [ 'X', 'some year' ];
yield 'Some year' => [ 'XXX', 'some year' ];
Curious as to what "-X" would yield.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And check what -5XXXX
results into, unless already tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And check what -5XXXX results into, unless already tested
50000 BC
this is nice, but still doesn't account for the unspecified digits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should perhaps be:
50000 BC (millennium, century and year in decem millennium unspecified)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does not scale. Because what is -5XXXXXXX ? How about 50000 BC (year unspecified)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-5XXXXXXX would be "50 millions years BC" and -5XXXX " 5 decem millennium BC",
so (given that your solution is the only viable at the moment) in general for unspecified digits we could use always the literal expression, that is "5 millenniums BC" instead than "5000 BC (year unspecified)" (with the first notation it is implicit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means implementing scales (https://en.wikipedia.org/wiki/Long_and_short_scales#Short_scale) e.g. the short scale to billion? As (according to Google) the universe is 13 billion years old and will likely exists for 22 billion years and this is about dates. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was checking further the specification here https://www.loc.gov/standards/datetime/
(Unspecified Digit) and the parser,
but I'm not sure that the parser is checking whether
the unspecified digit is placed incrementally from the right of the date component.
e.g.
the following should throw an error:
-XXXXXXX4 (4th year of any millennium BC)
e008d42
to
edb98e4
Compare
yield 'Scales (8 digits)' => [ '5XXXXXXXX', '5 hundreds of millions' ]; | ||
yield 'Scales (9 digits)' => [ '5XXXXXXXXX', '5 billions' ]; | ||
yield 'Scales (10 digits)' => [ '5XXXXXXXXXX', '5 tens of billions' ]; | ||
yield 'Scales (11 digits)' => [ '5XXXXXXXXXXX', '5 hundreds of billions' ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"5 decem millennia"? What does decem
mean here?
"5 hundreds of thousands" is just an amount; should it not also have "years" in there somewhere? Is this even a valid EDTF date to begin with? @mzeinstra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sometimes the word "year" is missing, I have still to add it where necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"5 decem millennia"? What does
decem
mean here?
"Decem millennium" theoretically means "Ten thousand years" it could be also used the expression "tens of thousands"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tens of thousands is indeed better, but we can also leave that up to the translators.
Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?
The most years I can find in the sciences is the half-life of Xenon, which is 18 billion trillion years :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we should not forget the objective here, that is to make these edtf-string human readable. If 99% of the use cases can be captured than that is ok with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously I also posted this link
https://asistdl.onlinelibrary.wiley.com/doi/epdf/10.1002/pra2.552
in the element.io SemanticMediawiki group because somebody asked about the support for "geological and historical eras".
Regarding the following
Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?
in my opinion if the user registers a date with unspecified digits and the library outputs a human readable format, and additionally computes correctly comparisons and intervals, that's really useful.
Still in my opinion the approach to present a year with mixed unspecified digits using numbers for the specified digits and literals for the unspecified (scaled) part, seems also formally rigorous. So using the example above -5XXXXXXX is not to be precise "50 millions years BC" but "5 tens of millions years BC" (in the output the number represents the specified digits, and the literal part the unspecified digits)
So what about, when this is more refined, to publish a demo (which could query the library using Ajax) and to propose this approach to the authors of the format, the Washington' Library of Congress, as far as I know ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik there is no standard for the humanisation of EDTF, humanised examples are used on the library of congres website but they are not the owner of the standard.
So your best efforts here will de facto help set the humanisation standard.
@mnhn-paul you might be able to provide us with a standard of humanising eras in EDTF. Eg. What would the date: -5XXXXXXX be humanised to?
Paul is from our museum of natural history and might have more insight into this.
|
||
return $ret + 1; | ||
} | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can apply https://www.entropywins.wtf/blog/2019/01/14/readable-functions-guard-clause/
$ret
is not a good var name
@@ -247,7 +244,55 @@ private function humanizeYearMonthDay( ?string $year, ?string $month, ?string $d | |||
); | |||
} | |||
|
|||
private function humanizeYear( int $year, UnspecifiedDigit $unspecifiedDigit ): string { | |||
private function scaleToMessageKey( int $scale ): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay for putting this in its own method!
|
||
// FIXME: reuse recursively the scale with trillions | ||
// e.g. tens-of-trillions etc., | ||
return 'edtf-tens-of-trillions'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, the only sensible use-case is in cosmology
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.
actually they are required only for testing purposes, e.g. by an user who wonder what is the output entering something like that 56XXXXXXXXXXXXXXXXXXX
|
||
if ( $date->isBC() ) { | ||
$ret .= " " . $this->message( "edtf-date-BC" ); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concatenating messages like this bakes in an assumption about the order of these messages, which might not hold in all languages. That is why for instance we have messages such as "edtf-qualified-date": "$1 ($2)"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I guessed something like that, so that should be taken into account and enhanced
$ret .= " " . $this->message( "edtf-date-BC" ); | ||
} | ||
|
||
return $ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$humanizedYear
fix #80